Main

Comparison of simple normalisation methods

Comparison of the simple normalisation strategies employed. MA plots showing the changes in ER binding after 48 hours of treatment with 100 nM fulvestrant. Three simple normalisation methods were applied to these data and compared to the raw count data. (A) Raw counts. (B) Reads per million (RPM) of reads in peaks. (C) RPM of aligned reads. (D) RPM of total reads. Note that the highlighted peaks remain above zero under all three standard normalisations.
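
The three RPM variants in panels B to D differ only in the denominator used. The following is a minimal sketch of how MA coordinates and RPM scaling could be computed, not the pipeline used for the figure; the function names, counts and library sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def rpm(counts, library_size):
    """Scale raw peak counts to reads per million of the chosen denominator
    (reads in peaks, aligned reads or total reads)."""
    return np.asarray(counts, dtype=float) * 1e6 / library_size

def ma_values(control, treated, pseudocount=1.0):
    """Return (A, M): mean log2 intensity and log2 fold change (treated / control)."""
    c = np.log2(np.asarray(control, dtype=float) + pseudocount)
    t = np.log2(np.asarray(treated, dtype=float) + pseudocount)
    return (c + t) / 2.0, t - c

# Hypothetical counts for three peaks and hypothetical library sizes.
control_counts = np.array([100, 400, 1600])
treated_counts = np.array([50, 200, 800])
control_total = 20_000_000
treated_total = 10_000_000

# Raw counts: every peak appears halved (M = -1).
a_raw, m_raw = ma_values(control_counts, treated_counts, pseudocount=0.0)

# RPM: the same peaks now appear unchanged (M = 0), because the treated
# library is half the size of the control library.
a_rpm, m_rpm = ma_values(
    rpm(control_counts, control_total),
    rpm(treated_counts, treated_total),
    pseudocount=0.0,
)
```

In this sketch each RPM variant shifts every M value by the same constant, so it can move the whole cloud of peaks up or down but cannot change its shape.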

MA Plot of H2Av normalisation

MA plots showing ER binding before and after treatment with fulvestrant, including the matched Dm H2Av spike-in control. (A) Reads corrected to total aligned reads showed the same off-centre peak density as observed in Figure 1. Putative unchanged ER binding sites are within the red triangle. (B) Overlay of the MA plots for the changes in chromatin binding of Hs ER (black) and Dm H2Av (blue). Dm peaks overlay the off-centre peak density. (C) Utilising the Dm H2Av binding events as a ground truth for 0-fold change, a linear fit to the log-fold change is generated and the fit is applied to adjust the Hs ER binding events.
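
The correction in panel C can be illustrated with a short sketch. It assumes that the fit is an ordinary least-squares line of M (log-fold change) against A (mean log intensity) through the spike-in peaks, and that the fitted trend is subtracted from the ER peaks; the caption does not specify these details, and all names and values below are hypothetical.

```python
import numpy as np

def fit_spikein_trend(a_spike, m_spike):
    """Least-squares line M = slope * A + intercept through the spike-in peaks."""
    slope, intercept = np.polyfit(a_spike, m_spike, deg=1)
    return slope, intercept

def adjust_fold_change(a_target, m_target, slope, intercept):
    """Subtract the fitted spike-in trend so that spike-in peaks centre on M = 0."""
    a_target = np.asarray(a_target, dtype=float)
    m_target = np.asarray(m_target, dtype=float)
    return m_target - (slope * a_target + intercept)

# Hypothetical spike-in (Dm H2Av) peaks lying on M = 0.1 * A + 0.8.
a_spike = np.array([4.0, 6.0, 8.0, 10.0])
m_spike = np.array([1.2, 1.4, 1.6, 1.8])
slope, intercept = fit_spikein_trend(a_spike, m_spike)

# Hypothetical target (Hs ER) peaks: one lost on treatment, one sitting
# on the spike-in trend and therefore unchanged after adjustment.
a_er = np.array([5.0, 9.0])
m_er = np.array([-1.0, 1.7])
m_er_adjusted = adjust_fold_change(a_er, m_er, slope, intercept)
```

After adjustment the second ER peak has a log-fold change of 0, which is the behaviour expected of a site that tracks the spike-in.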

RARA gene locus with CTCF Spike-in

Figure 3 can be viewed interactively on the UCSC track.

MA Plot of CTCF Peaks

Comparison of the control regions used to normalise the ER analysis before and after treatment. Dots highlighted in red are significant (FDR = 0.01). The CTCF peaks used for normalisation show no significant change in the number of reads before and after treatment.

H2Av with DiffBind

Comparison of DiffBind output before and after applying the corrected size factors from our pipeline, generated from the Drosophila spike-in control. (A) Analysis of ER binding before and after treatment with fulvestrant shows that DiffBind’s default normalisation strategy is more effective than the DESeq2 default, but still leaves a bias between samples. (B) Applying the corrected size factors from our DESeq2 pipeline reduces the bias in the analysis (Data: SLX-8047).

Linear model

Comparison of mean counts in CTCF peaks before and after treatment. If the samples have no systematic bias before and after treatment, the linear fit would be expected to have a gradient of 1. Here, we establish that the gradient is < 1, implying a systematic bias between samples. The read counts in the treated samples' peaks are corrected (blue), removing the bias and resulting in a new gradient of 1.
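
A minimal sketch of this gradient correction follows. It assumes a least-squares fit constrained through the origin and a correction that divides the treated counts by the fitted gradient; the caption does not state whether an intercept was fitted, and the counts below are hypothetical.

```python
import numpy as np

def gradient_through_origin(control_mean, treated_mean):
    """Least-squares gradient g of treated = g * control (no intercept)."""
    x = np.asarray(control_mean, dtype=float)
    y = np.asarray(treated_mean, dtype=float)
    return float(np.dot(x, y) / np.dot(x, x))

# Hypothetical mean counts in three CTCF control peaks.
control_ctcf = np.array([100.0, 200.0, 400.0])
treated_ctcf = np.array([80.0, 160.0, 320.0])

# A gradient below 1 indicates systematically fewer reads in treated samples.
gradient = gradient_through_origin(control_ctcf, treated_ctcf)

# Rescale the treated counts so that the control peaks return to parity.
treated_corrected = treated_ctcf / gradient
gradient_after = gradient_through_origin(control_ctcf, treated_corrected)
```

The same divisor would then be applied to the treated counts in the ER peaks, on the assumption that the bias measured in the CTCF peaks is shared by the whole sample.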

Comparison of CTCF and H2Av normalisation methods

Comparison of normalisation methods using a consensus peak set. (A) The analyses of the CTCF-normalised (blue) and H2Av-normalised (green) datasets, using an ER consensus peak set of 10,000 peaks, were formatted as MA plots and overlaid. This recovered the low-fold-change, higher-intensity peaks that were not visible in Figure \ref{fig:ERCTCF}A, and both datasets showed a similar distribution. (B) Comparison of fold-change values for individual ER binding sites between the two datasets showed that the inclusion of these sites did not appear to affect the correlation (r = 0.77).

Supplementary

Method comparison

Comparison of ChIP-seq pipelines. (A) ChIPComp data were plotted from the CountSet object; the results show a high number of false-positive up-regulated sites. (B) edgeR normalisation is designed for the analysis of transcriptional data. In the case of large-scale unidirectional changes in binding, the assumptions of the normalisation fail, giving rise to a distribution that is artificially symmetric. (C) DESeq2 makes use of similar assumptions and results in a similar distortion of the data. (D) DiffBind utilises normalisation to total library size and performs significantly better than the other three methods, but does not attempt to control for systematic bias in the pull-down efficiency of the ChIP.

Reproducibility plots

Correlation plots of replicate experiments. (A) Scatter plots showing the correlation between the replicates with the lowest correlation value. These are provided for both the control (top) and treatment (bottom) conditions. The plotted pair is highlighted with a thick border in the tables on the right. Colour represents density: blue = lowest, red = highest. (B) Tables showing the correlation coefficient between each pair of replicates.

Venn diagrams

Venn diagrams showing peak overlap between replicates. (A) Peak overlap between samples for the CTCF internal control. (B) Peak overlap for the murine chromatin spike-in samples. (C) Peak overlap for the Drosophila chromatin spike-in. As expected, samples treated with fulvestrant showed fewer ER peaks and a higher proportion of peaks unique to individual samples. This is indicative of a loss of ER binding and the resulting decrease in signal-to-noise ratio. As we use a statistical analysis and model the variance of the peaks between replicates, false positives in the treated condition will be removed at the point of differential binding analysis.

MA Plots of mouse ER normalisation

MA plots showing the addition of Mm-derived chromatin spike-in to the ChIP-seq analysis of MCF-7 before and after treatment with fulvestrant. (A) The MA plot after scaling-factor-based normalisation shows the same characteristic grouping of peaks off axis. (B) ER binding in the Mm samples shows a considerable increase in binding after treatment of the MCF-7 cell line with fulvestrant. (C) Attempting to fit a correction factor to the data results in a significant distortion.

Relative read alignments in mouse samples

Distribution of reads for Mm chromatin spike-in normalisation strategy. Comparison of murine chromatin between samples showed no systematic bias in the sample preparation. Bar plots (left axis) represent the fraction of total aligned reads. The dot plot represents the total aligned reads (right axis) for each sample.

MA plots of CTCF Parallel-Factor ChIP

MA plots showing ER binding before and after treatment with fulvestrant, including the matched CTCF control. (A) Reads corrected to total aligned reads showed the same off-centre peak density as observed for all data that were not normalised to a spike-in or internal control. (B) Overlay of the MA plots for the changes in chromatin binding of ER (black) and CTCF (grey). CTCF peaks overlay the off-centre peak density. (C) Utilising the CTCF binding events as a ground truth for 0-fold change, a linear fit to the log-fold change is generated (blue line). The fit is then also applied to the ER binding events.

ER and CTCF heatmaps

Clustering of samples before and after ER and CTCF peak extraction shows that the effect of fulvestrant on ER peaks drives clustering of the raw data. To confirm that the effects seen at the RARA locus were consistent across the genome, we compared the clustering of the CTCF and the ER peaks with respect to treatment with fulvestrant. Initial clustering was weakly correlated with the treatment condition (A). Clustering on CTCF-derived peak data alone (B) resulted in a loss of grouping by treatment, while clustering on ER-derived peak data alone (C) led to a clearer separation by treatment.

Normalisation using DESeq2 SizeFactors

Normalisation of ER binding to an external spike-in, implemented using DESeq2. Highlighted data points are considered significant fold-changes at FDR = 0.01. (A) Initial analysis of ER binding with default parameters shows an equal increase and decrease in ER binding. The distribution seen is not reflective of the documented response of ER to treatment with fulvestrant. (B) Estimating the DESeq2 size factors from the sample spike-in corrects the distortion in the results.

Normalisation of ER binding to the internal CTCF control. Highlighted data points are considered significant fold-changes at FDR = 0.01. (A) Initial analysis with default DESeq2 parameters gives a similar distortion to that seen previously. (B) Using the CTCF peaks as an internal control allows the data to be corrected.
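
The effect of estimating size factors from control peaks, rather than from the peaks under study, can be illustrated with a NumPy re-implementation of the median-of-ratios estimator. This is a sketch under stated assumptions and not a call into DESeq2 itself; the function name and the count matrices are hypothetical.

```python
import numpy as np

def median_of_ratios(counts):
    """Median-of-ratios size factors for a peaks x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Peaks containing a zero have no finite geometric mean and are skipped.
    keep = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[keep])
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Hypothetical counts; columns are (control sample, treated sample).
# Control peaks (spike-in or CTCF): the treated library is twice as deep.
control_peaks = np.array([[100, 200], [50, 100], [400, 800]])
# ER peaks: raw counts are halved on treatment despite the deeper library.
er_peaks = np.array([[300, 150], [600, 300]])

# Default behaviour: size factors estimated from the ER peaks themselves,
# which absorbs the global loss of binding and reports no change.
sf_default = median_of_ratios(er_peaks)
norm_default = er_peaks / sf_default
lfc_default = np.log2(norm_default[:, 1] / norm_default[:, 0])

# Control-based size factors preserve the global loss of ER binding.
sf_control = median_of_ratios(control_peaks)
norm_control = er_peaks / sf_control
lfc_control = np.log2(norm_control[:, 1] / norm_control[:, 0])
```

In this toy example the default size factors yield a log2 fold change of 0 for every ER peak, while the control-derived size factors yield -2, which mirrors the distortion and its correction described in panels A and B.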

Comparison of normalisation DiffBind plots

Comparison of DiffBind results before and after our two methods of normalisation. (A) Normalisation to library size. (B) Applying the corrected size factors from our DESeq2 pipeline, generated from the CTCF internal control. (C) Applying a correction using linear regression of CTCF peaks between conditions to normalise the data. The result is a 10.7% increase in the number of loci detected as having significantly changed ER binding.

Cross normalisation

Comparison of fold-change of ER binding after both xenogenic and cross-normalisation. Pearson's correlation between the two methods is 0.992 (3 s.f., p-value tending to 0). Deviation of data points from parity is a result of the integer nature of read counts; nonetheless, this effect is very small, as demonstrated by the correlation coefficient between the two datasets.
